Dimensionality Reduction for Classification with High-dimensional Data
Abstract
This thesis addresses dimensionality reduction problems in classification for both high-dimensional multivariate data and functional data. High-dimensional data are data with a large number of variables, often larger than the number of observations. Such data are encountered in a wide range of areas, including engineering, biometrics, psychometrics, and neuroimaging. Classifying them is difficult because the enormous number of variables poses challenges to conventional classification methods and renders many classical techniques impractical. A natural solution is to add a dimensionality reduction step before a classification technique is applied. To deal with multivariate data, two approaches are proposed: a simulated annealing (SA) based method and a multivariate adaptive stochastic search (MASS) method. Both use stochastic search algorithms to select a handful of optimal transformation directions from a large number of random directions in each iteration. One advantage of the proposed methods is that they can accurately project the data onto very low-dimensional non-linear, as well as linear, spaces. The methods are designed to mimic variable selection methods, such as the Lasso, variable combination methods, such as PCA, or a method that combines the two approaches. In particular, MASS can adaptively adjust the model complexity level, and hence performs well in situations where variable selection or variable combination methods fail. We demonstrate the strengths of SA and MASS on an extensive range of simulated and real-data studies by comparing them to many classical and modern classification methods. Classification problems associated with functional data are also addressed. We propose a functional adaptive classification (FAC) approach which takes the functional response into consideration and produces highly accurate and interpretable results.
FAC is also based on a stochastic search procedure guided by the evaluation of model complexity. This often results in a simple relationship between functional covariates and the reduced data and makes the model interpretable. Simulation studies and an fMRI time course study are also provided to show the effectiveness of the proposed method.
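The stochastic search over random projection directions mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's SA or MASS algorithm, whose details the abstract does not give. It is a generic simulated annealing loop under stated assumptions: candidate directions are random unit vectors, one direction is replaced per iteration, and a direction set is scored by the training accuracy of a nearest-centroid rule. The function names, the scoring rule, the cooling schedule, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def project(X, directions):
    """Project each row of X onto every direction (plain dot products)."""
    return [[sum(xi * di for xi, di in zip(x, d)) for d in directions] for x in X]


def centroid_accuracy(Z, y):
    """Training accuracy of a nearest-centroid rule on the reduced data Z."""
    labels = sorted(set(y))
    cents = {}
    for c in labels:
        rows = [z for z, yi in zip(Z, y) if yi == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    hits = 0
    for z, yi in zip(Z, y):
        pred = min(labels, key=lambda c: sum((a - b) ** 2 for a, b in zip(z, cents[c])))
        hits += pred == yi
    return hits / len(y)


def random_direction(p, rng):
    """Draw a unit-length Gaussian direction in R^p."""
    d = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]


def anneal_directions(X, y, k=2, iters=200, t0=0.1, cooling=0.98, seed=0):
    """Simulated annealing over sets of k projection directions (illustrative only)."""
    rng = random.Random(seed)
    p = len(X[0])
    current = [random_direction(p, rng) for _ in range(k)]
    cur_score = centroid_accuracy(project(X, current), y)
    best, best_score = current, cur_score
    temp = t0
    for _ in range(iters):
        # Proposal (assumed): swap one direction for a fresh random one.
        cand = list(current)
        cand[rng.randrange(k)] = random_direction(p, rng)
        score = centroid_accuracy(project(X, cand), y)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if score >= cur_score or rng.random() < math.exp((score - cur_score) / temp):
            current, cur_score = cand, score
            if cur_score > best_score:
                best, best_score = current, cur_score
        temp *= cooling  # geometric cooling (assumed schedule)
    return best, best_score


# Toy data (hypothetical): 40 observations, 20 variables, classes differ in variable 0 only.
data_rng = random.Random(1)
X = [[data_rng.gauss(2.0 if i < 20 else -2.0, 1.0)]
     + [data_rng.gauss(0.0, 1.0) for _ in range(19)] for i in range(40)]
y = [0] * 20 + [1] * 20
dirs, acc = anneal_directions(X, y, k=2)
```

Here `dirs` holds the two selected directions and `acc` the training accuracy they achieve on the toy data. In practice the score would be estimated on held-out data or by cross-validation, since training accuracy tends to be optimistic when the number of variables is large.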
Similar resources
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...
2D Dimensionality Reduction Methods without Loss
In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for face recognition. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...
Performing a Preprocessing Step before the Feature Extraction Step in the Classification of Hyperspectral Image Data
Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been successfully applied to multispectral data are not as effective for hyperspectral data. Various investigations indicate that the key problem that causes poor performance in the stochastic approaches t...
Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low-dimensional feature space to efficiently represent the original high-dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields, especially in the analysis of high-dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations
The ability to record the high-resolution spectral signature of the earth's surface is arguably the most important feature of hyperspectral sensors. Classification of hyperspectral imagery, in turn, is one of the methods for extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images in terms of information content, there...
Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)
This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, selection of suitable features may significantly affect the performance of the process. Simulations were conducted at SNRs of 5 dB and 10 dB. Test and ...
Publication date: 2009